FALSE DISCOVERIES OCCUR EARLY ON THE LASSO PATH

Authors

  • Weijie Su
  • Małgorzata Bogdan
  • Emmanuel Candès
Abstract

In regression settings where explanatory variables have very low correlations and where there are relatively few effects each of large magnitude, it is commonly believed that the Lasso shall be able to find the important variables with few errors, if any. In contrast, this paper shows that this is not the case even when the design variables are stochastically independent. In a regime of linear sparsity, we demonstrate that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are. We derive a sharp asymptotic trade-off between false and true positive rates or, equivalently, between measures of type I and type II errors along the Lasso path. This trade-off states that if we ever want to achieve a type II error (false negative rate) under a given threshold, then anywhere on the Lasso path the type I error (false positive rate) will need to exceed a given threshold so that we can never have both errors at a low level at the same time. Our analysis uses tools from approximate message passing (AMP) theory as well as novel elements to deal with a possibly adaptive selection of the Lasso regularizing parameter.
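
The trade-off described in the abstract can be explored numerically with a small simulation. The sketch below is not the paper's method (the paper's analysis relies on AMP theory); it simply fits the Lasso on a grid of regularization values with a plain coordinate-descent solver and records the false discovery proportion (FDP) and true positive proportion (TPP) at each value. The function names `lasso_cd` and `fdp_tpp_along_path`, the dimensions, the effect size, the noise level, and the grid of regularization values are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def lasso_cd(X, y, lam, beta_init, n_sweeps=100, tol=1e-8):
    """Minimise 0.5*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    beta = beta_init.copy()
    col_sq = (X ** 2).sum(axis=0)  # squared column norms
    resid = y - X @ beta
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(X.shape[1]):
            old = beta[j]
            # Correlation of column j with the residual that excludes column j.
            rho = X[:, j] @ resid + col_sq[j] * old
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def fdp_tpp_along_path(n=100, p=200, k=20, effect=10.0, n_lambdas=20, seed=0):
    """Simulate one data set; return (lambdas, FDP, TPP) on a lambda grid."""
    rng = np.random.default_rng(seed)
    # Assumed setup: independent Gaussian design, k strong effects, unit noise.
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    beta_true = np.zeros(p)
    beta_true[:k] = effect
    y = X @ beta_true + rng.standard_normal(n)
    is_true = beta_true != 0

    # Largest correlation with y; at this lambda the model is (essentially) empty.
    lam_max = np.max(np.abs(X.T @ y))
    lambdas = lam_max * np.geomspace(1.0, 0.01, n_lambdas)

    beta_hat = np.zeros(p)
    fdp, tpp = [], []
    for lam in lambdas:
        beta_hat = lasso_cd(X, y, lam, beta_hat)  # warm start from previous fit
        selected = beta_hat != 0
        false_disc = np.sum(selected & ~is_true)
        true_disc = np.sum(selected & is_true)
        fdp.append(false_disc / max(selected.sum(), 1))
        tpp.append(true_disc / max(k, 1))
    return lambdas, np.array(fdp), np.array(tpp)


if __name__ == "__main__":
    lambdas, fdp, tpp = fdp_tpp_along_path()
    for lam, f, t in zip(lambdas, fdp, tpp):
        print(f"lambda={lam:8.3f}  FDP={f:.2f}  TPP={t:.2f}")
```

Printing FDP next to TPP as the regularization parameter decreases shows where null variables enter the model relative to true ones. A single finite-sample run like this is only an illustration and does not verify the asymptotic trade-off stated in the abstract.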

Related resources

Component-wise gradient boosting and false discovery control in survival analysis with high-dimensional covariates

Motivation: Technological advances that allow routine identification of high-dimensional risk factors have led to high demand for statistical techniques that enable full utilization of these rich sources of information for genetics studies. Variable selection for censored outcome data as well as control of false discoveries (i.e. inclusion of irrelevant variables) in the presence of high-dimensi...

Controlling false discoveries in Bayesian gene networks with lasso regression p-values

P-values are being computed for increasingly complicated statistics, but evaluations of their quality are lacking. Meanwhile, accurate p-values enable significance comparison across batches of hypothesis tests and, consequently, unified false discovery rate (FDR) control. This article discusses two related questions in this setting. First, we propose statistical tests to evaluate the quality of p-value...

Split LBI: An Iterative Regularization Path with Structural Sparsity

An iterative regularization path with structural sparsity is proposed in this paper based on variable splitting and the Linearized Bregman Iteration, hence called Split LBI. Despite its simplicity, Split LBI outperforms the popular generalized Lasso in both theory and experiments. A theory of path consistency is presented that equipped with a proper early stopping, Split LBI may achieve model s...

Supplementary materials for Statistical Estimation and Testing via the Sorted ℓ1 Norm

In this note we give a proof showing that even though the number of false discoveries and the total number of discoveries are not continuous functions of the parameters, the formulas we obtain for the false discovery proportion (FDP) and the power, namely, (B.3) and (B.4) in the paper Statistical Estimation and Testing via the Sorted ℓ1 Norm are mathematically valid. We recall that these formul...

Publication date: 2015